-
Notifications
You must be signed in to change notification settings - Fork 205
OCPBUGS-62858: pkg/cvo/metrics: Add a new --serving-client-certificate-authorities-file option #1240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@wking: This pull request references Jira Issue OCPBUGS-62858, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: wking The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
34b5010
to
f4f37d7
Compare
…ile option In 313f8fb (CVO protects /metrics with authorization, 2025-07-22, openshift#1215) and 833a491 (CVO protects /metrics with authorization, 2025-07-22, openshift#1215), the /metrics endpoint began requiring client auth. The only authentication system was Bearer tokens, and the only authorization system was validating that the token belonged to system:serviceaccount:openshift-monitoring:prometheus-k8s. That worked well for standalone clusters, where the ServiceMonitor scraper is the Prometheus from the openshift-monitoring namespace. But it broke scraping on HyperShift [1], where the ServiceMonitor does not request any client authorization [2]. Getting ServiceAccount tokens (and keeping them fresh [3]) from the hosted cluster into a Prometheus scraper running on the management cluster is hard. But for other ServiceMonitors, HyperShift configures keySecret [4] asking the scraper to get its client certificate from a metrics-client Secret that the HostedControlPlane controller maintains in the management cluster namespace [5]. This commit adds a new --serving-client-certificate-authorities-file option, which HyperShift can set when it configures the CVO Deployment [6], while mounting the root CA ConfigMap into the Pod. Now that there are three files (serving key, serving cert, client CAs) that we might be watching for changes, I've taken the opportunity to refactor the checksumming and change-tracking to use a map from filename to checksum. That way we can extend more easily if further configuration files are added in the future, without having to pass around a series of paths and checksums as distinct arguments. I'm a bit sad about AppendCertsFromPEM only returning a boolean [7], leaving us unsure about whether all the certificates in the file were parsed, or, if there were parsing issues, what those issues were. But with HyperShift hopefully reliably setting up CA Secrets that do not cause parsing issues, I'm ok not coding that defensively. If the standard library grows a parser that is more transparent about parsing issues, we can pivot to that once it exists. [1]: https://issues.redhat.com/browse/OCPBUGS-62858 [2]: https://github.com/openshift/hypershift/blob/2728a4e2899ee58690e3d4e58b319eb5e78643af/control-plane-operator/controllers/hostedcontrolplane/v2/assets/cluster-version-operator/servicemonitor.yaml#L18 [3]: https://kubernetes.io/docs/concepts/configuration/secret/#serviceaccount-token-secrets [4]: https://github.com/openshift/hypershift/blob/2728a4e2899ee58690e3d4e58b319eb5e78643af/control-plane-operator/controllers/hostedcontrolplane/v2/assets/kube-apiserver/servicemonitor.yaml#L24-L26 [5]: https://github.com/openshift/hypershift/blob/2728a4e2899ee58690e3d4e58b319eb5e78643af/control-plane-operator/controllers/hostedcontrolplane/pki/sa.go#L35-L36 [6]: https://github.com/openshift/hypershift/blob/2728a4e2899ee58690e3d4e58b319eb5e78643af/control-plane-operator/controllers/hostedcontrolplane/v2/assets/cluster-version-operator/deployment.yaml#L25-L35 [7]: https://pkg.go.dev/crypto/x509#CertPool.AppendCertsFromPEM
f4f37d7
to
d7ca459
Compare
@wking: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic looks solid 👍 I have no objections regarding the core of the PR.
A few nits and notes.
} | ||
|
||
func (a *authHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: empty line
// If no errors occur, returns true if both files have changed and | ||
// neither is an empty file. Otherwise returns false and any error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description is now outdated - return values, parameters.
for fName := range checksums { | ||
err := requireFileExistsAndNotEmpty(fName) | ||
if err != nil { | ||
return false, nil, fmt.Errorf("Server configuration file must exist and be non-empty. Certificates will not be rotated. %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I am not sure whether the error will always contain the filename.
return false, nil, fmt.Errorf("Server configuration file must exist and be non-empty. Certificates will not be rotated. %w", err) | |
return false, nil, fmt.Errorf("Server configuration file %s must exist and be non-empty. Certificates will not be rotated. %w", fName, err) |
// metrics serving options | ||
cmd.PersistentFlags().StringVar(&opts.ServingCertFile, "serving-cert-file", opts.ServingCertFile, "The X.509 certificate file for serving metrics over HTTPS. You must set both --serving-cert-file and --serving-key-file unless you set --listen empty.") | ||
cmd.PersistentFlags().StringVar(&opts.ServingKeyFile, "serving-key-file", opts.ServingKeyFile, "The X.509 key file for serving metrics over HTTPS. You must set both --serving-cert-file and --serving-key-file unless you set --listen empty.") | ||
cmd.PersistentFlags().StringVar(&opts.ServingClientCAFile, "serving-client-certificate-authorities-file", opts.ServingClientCAFile, "The X.509 certificates file containing a series of PEM encoded certificates for Certificate Authorities trusted to sign metrics client certificates. If set, only clients who provide mTLS client certs signed by those CAs will be trusted. If unset, only clients with valid tokens for the prometheus-k8s ServiceAccount in the openshift-monitoring namespace will be trusted.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently have a --metrics-ca-bundle-file
flag. For consistency, maybe we can rename the new flag to something like --serving-client-ca-bundle-file
(if my terminology is correct). Not that important, but any potential renaming of flags is not fun after they are integrated into other systems. I will not block on this.
In #1215, the
/metrics
endpoint began requiring client auth. The only authentication system was Bearer tokens, and the only authorization system was validating that the token belonged tosystem:serviceaccount:openshift-monitoring:prometheus-k8s.
That worked well for standalone clusters, where the ServiceMonitor scraper is the Prometheus from the
openshift-monitoring
namespace. But it broke scraping on HyperShift, where the ServiceMonitor does not request any client authorization. Getting ServiceAccount tokens (and keeping them fresh) from the hosted cluster into a Prometheus scraper running on the management cluster is hard. But for other ServiceMonitors, HyperShift configureskeySecret
asking the scraper to get its client certificate from ametrics-client
Secret that the HostedControlPlane controller maintains in the management cluster namespace.This commit adds a new
--serving-client-certificate-authorities-file
option, which HyperShift can set when it configures the CVO Deployment, while mounting the root CA ConfigMap into the Pod.Now that there are three files (serving key, serving cert, client CAs) that we might be watching for changes, I've taken the opportunity to refactor the checksumming and change-tracking to use a map from filename to checksum. That way we can extend more easily if further configuration files are added in the future, without having to pass around a series of paths and checksums as distinct arguments.
I'm a bit sad about
AppendCertsFromPEM
only returning a boolean, leaving us unsure about whether all the certificates in the file were parsed, or, if there were parsing issues, what those issues were. But with HyperShift hopefully reliably setting up CA Secrets that do not cause parsing issues, I'm ok not coding that defensively. If the standard library grows a parser that is more transparent about parsing issues, we can pivot to that once it exists.